Enhancing Network Initialization for Medical AI Models Using Large-Scale, Unlabeled Natural Images
Pre-training on large datasets, such as ImageNet, has become the gold
standard in medical image analysis. However, the emergence of self-supervised
learning (SSL), which leverages unlabeled data to learn robust features,
presents an opportunity to bypass the intensive labeling process. In this
study, we explored whether SSL pre-training on non-medical images can be
applied to chest radiographs and how it compares to supervised pre-training
on non-medical and on medical images. We utilized a vision transformer and
initialized its weights based on (i) SSL pre-training on natural images
(DINOv2), (ii) supervised learning (SL) pre-training on natural images (the
ImageNet dataset), and (iii) SL pre-training on chest radiographs from the
MIMIC-CXR database. We tested our approach on over
800,000 chest radiographs from six large global datasets, diagnosing more than
20 different imaging findings. Our SSL pre-training on curated images not only
outperformed ImageNet-based pre-training (P<0.001 for all datasets) but, in
certain cases, also exceeded SL on the MIMIC-CXR dataset. Our findings suggest
that selecting the right pre-training strategy, especially with SSL, can be
pivotal for improving artificial intelligence (AI)'s diagnostic accuracy in
medical imaging. By demonstrating the promise of SSL in chest radiograph
analysis, we underline a transformative shift towards more efficient and
accurate AI models in medical imaging.
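Comparisons of pre-training strategies like the one above ultimately measure how well the resulting frozen backbone features support a downstream classifier. As an illustrative sketch only (not the study's actual pipeline), a linear probe on frozen features can be written as follows; the synthetic feature matrix stands in for ViT embeddings obtained under any of the three initializations:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for frozen backbone features: in the study these would be
# ViT embeddings under DINOv2 (SSL), ImageNet (SL), or MIMIC-CXR (SL) weights.
n, d = 200, 16
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = (X @ w_true > 0).astype(float)  # hypothetical binary imaging finding

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_linear_probe(X, y, lr=0.5, epochs=300):
    """Fit a logistic-regression head on frozen features; the backbone itself
    is never updated, so probe accuracy reflects feature quality."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = sigmoid(X @ w)
        w -= lr * X.T @ (p - y) / len(y)
    return w

w = train_linear_probe(X, y)
acc = float(((sigmoid(X @ w) > 0.5) == y).mean())
print(f"probe accuracy: {acc:.2f}")
```

In this paradigm, a higher probe accuracy for one initialization indicates more linearly separable, and hence more transferable, features.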
Empowering Clinicians and Democratizing Data Science: Large Language Models Automate Machine Learning for Clinical Studies
A knowledge gap persists between Machine Learning (ML) developers (e.g., data
scientists) and practitioners (e.g., clinicians), hampering the full
utilization of ML for clinical data analysis. We investigated the potential of
ChatGPT Advanced Data Analysis (ADA), an extension of GPT-4, to bridge this
gap and perform ML analyses efficiently. Real-world clinical datasets and study
details from large trials across various medical specialties were presented to
ChatGPT ADA without specific guidance. ChatGPT ADA autonomously developed
state-of-the-art ML models based on the original study's training data to
predict clinical outcomes such as cancer development, cancer progression,
disease complications, or biomarkers such as pathogenic gene sequences.
Strikingly, these ML models matched or outperformed their published
counterparts. We conclude that ChatGPT ADA offers a promising avenue to
democratize ML in medicine, making advanced analytics accessible to non-ML
experts and promoting broader applications in medical research and practice.
Federated learning for secure development of AI models for Parkinson's disease detection using speech from different languages
Parkinson's disease (PD) is a neurological disorder impacting a person's
speech. Among automatic PD assessment methods, deep learning models have gained
particular interest. Recently, the community has explored cross-pathology and
cross-language models which can improve diagnostic accuracy even further.
However, strict patient data privacy regulations largely prevent institutions
from sharing patient speech data with each other. In this paper, we employ
federated learning (FL) for PD detection using speech signals from three
real-world language corpora of German, Spanish, and Czech, each from a separate
institution. Our results indicate that the FL model outperforms all local
models in terms of diagnostic accuracy, while performing comparably to the
model trained on the centrally combined training sets, with the advantage of
not requiring any data sharing among collaborators. This will simplify
inter-institutional collaborations, resulting in improved patient outcomes.
Comment: Accepted for INTERSPEECH 202
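The aggregation step in FL is commonly implemented as federated averaging (FedAvg): each institution trains locally, and only model parameters, never raw speech data, are combined. The study does not publish its aggregation code, so the following is a minimal sketch of size-weighted averaging across three hypothetical sites:

```python
import numpy as np

def fed_avg(client_weights, client_sizes):
    """Federated averaging: combine local model parameters,
    weighted by each institution's dataset size (no raw data is shared)."""
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))

# Three hypothetical sites (e.g., German, Spanish, Czech corpora) with
# illustrative local parameter vectors and dataset sizes.
german  = np.array([1.0, 2.0])
spanish = np.array([3.0, 4.0])
czech   = np.array([5.0, 6.0])
global_w = fed_avg([german, spanish, czech], [100, 50, 50])
print(global_w)  # size-weighted mean of the three local parameter vectors
```

In practice this averaging runs over every tensor of a deep network and is repeated for many communication rounds, but the aggregation rule itself is exactly this weighted mean.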
Collaborative Training of Medical Artificial Intelligence Models with non-uniform Labels
Artificial intelligence (AI) methods are revolutionizing medical image
analysis. However, robust AI models require large multi-site datasets for
training. While multiple stakeholders have provided publicly available
datasets, the ways in which these data are labeled differ widely. For example,
one dataset of chest radiographs might contain labels denoting the presence of
metastases in the lung, while another dataset of chest radiographs might focus
on the presence of pneumonia. With conventional approaches, these data cannot
be used together to train a single AI model. We propose a new framework that we
call flexible federated learning (FFL) for collaborative training on such data.
Using publicly available data of 695,000 chest radiographs from five
institutions - each with differing labels - we demonstrate that large and
heterogeneously labeled datasets can be used to train one big AI model with
this framework. We find that models trained with FFL are superior to models
that are trained on matching annotations only. This may pave the way for
training of truly large-scale AI models that make efficient use of all existing
data.
Comment: 2 figures, 3 tables, 5 supplementary tables
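One way to train on heterogeneously labeled datasets (the paper's exact FFL formulation may differ) is to mask the loss so that each site contributes gradient signal only for the findings it actually annotated. A minimal sketch with a hypothetical two-site, two-finding setup:

```python
import numpy as np

def masked_bce(logits, labels, mask):
    """Binary cross-entropy computed only where a site provides a label
    (mask == 1); findings a dataset never annotated contribute nothing."""
    p = 1.0 / (1.0 + np.exp(-logits))
    eps = 1e-7
    loss = -(labels * np.log(p + eps) + (1 - labels) * np.log(1 - p + eps))
    return (loss * mask).sum() / mask.sum()

# Hypothetical example: site A labels only finding 0 (e.g., metastases),
# site B labels only finding 1 (e.g., pneumonia) -> different mask rows.
logits = np.array([[2.0, -1.0], [0.5, 1.5]])
labels = np.array([[1.0,  0.0], [0.0, 1.0]])
mask   = np.array([[1.0,  0.0], [0.0, 1.0]])  # which findings each site annotated
loss = masked_bce(logits, labels, mask)
print(loss)
```

Because unlabeled findings are simply masked out, all radiographs can flow through one shared model regardless of which label set their source institution used.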
Fibroglandular Tissue Segmentation in Breast MRI using Vision Transformers -- A multi-institutional evaluation
Accurate and automatic segmentation of fibroglandular tissue in breast MRI
screening is essential for the quantification of breast density and background
parenchymal enhancement. In this retrospective study, we developed and
evaluated a transformer-based neural network for breast segmentation (TraBS) in
multi-institutional MRI data, and compared its performance to the
well-established convolutional neural network nnUNet. TraBS and nnUNet were trained
and tested on 200 internal and 40 external breast MRI examinations using manual
segmentations generated by experienced human readers. Segmentation performance
was assessed in terms of the Dice score and the average symmetric surface
distance. The Dice score for nnUNet was lower than for TraBS on the internal
testset (0.909 ± 0.069 versus 0.916 ± 0.067, P<0.001) and on the external
testset (0.824 ± 0.144 versus 0.864 ± 0.081, P=0.004). Moreover, the
average symmetric surface distance was higher (=worse) for nnUNet than for
TraBS on the internal (0.657 ± 2.856 versus 0.548 ± 2.195, P=0.001) and on
the external testset (0.727 ± 0.620 versus 0.584 ± 0.413, P=0.03). Our
study demonstrates that transformer-based networks improve the quality of
fibroglandular tissue segmentation in breast MRI compared to
convolutional-based models like nnUNet. These findings might help to enhance
the accuracy of breast density and parenchymal enhancement quantification in
breast MRI screening.
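Both reported metrics are standard in segmentation evaluation. The Dice score, for instance, measures overlap between a predicted and a reference binary mask, and can be computed as follows (a generic implementation, not the study's code):

```python
import numpy as np

def dice_score(pred, truth):
    """Dice coefficient between two binary segmentation masks:
    2*|A intersect B| / (|A| + |B|), ranging from 0 (no overlap) to 1."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    inter = np.logical_and(pred, truth).sum()
    denom = pred.sum() + truth.sum()
    return 2.0 * inter / denom if denom else 1.0

# Toy 2x3 masks: two voxels agree, each mask has three foreground voxels.
pred  = np.array([[1, 1, 0], [0, 1, 0]])
truth = np.array([[1, 0, 0], [0, 1, 1]])
score = dice_score(pred, truth)
print(score)  # 2*2 / (3+3) = 0.666...
```

The average symmetric surface distance complements this overlap measure by quantifying, in millimeters, how far the two segmentation boundaries lie apart on average.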
Enhancing domain generalization in the AI-based analysis of chest radiographs with federated learning
Developing robust artificial intelligence (AI) models that generalize well to unseen datasets is challenging and usually requires large and variable datasets, preferably from multiple institutions. In federated learning (FL), a model is trained collaboratively at numerous sites that hold local datasets without exchanging them. So far, the impact of training strategy, i.e., local versus collaborative, on the diagnostic on-domain and off-domain performance of AI models interpreting chest radiographs has not been assessed. Consequently, using 610,000 chest radiographs from five institutions across the globe, we assessed diagnostic performance as a function of training strategy (i.e., local vs. collaborative), network architecture (i.e., convolutional vs. transformer-based), single versus cross-institutional performance (i.e., on-domain vs. off-domain), imaging finding (i.e., cardiomegaly, pleural effusion, pneumonia, atelectasis, consolidation, pneumothorax, and no abnormality), dataset size (i.e., from n = 18,000 to 213,921 radiographs), and dataset diversity. Large datasets not only showed minimal performance gains with FL but, in some instances, even exhibited decreases. In contrast, smaller datasets revealed marked improvements. Thus, on-domain performance was mainly driven by training data size. However, off-domain performance leaned more on training diversity. When trained collaboratively across diverse external institutions, AI models consistently surpassed models trained locally for off-domain tasks, emphasizing FL’s potential in leveraging data diversity. In conclusion, FL can bolster diagnostic privacy, reproducibility, and off-domain reliability of AI models and, potentially, optimize healthcare outcomes.
The effect of speech pathology on automatic speaker verification: a large-scale study
Navigating the challenges of data-driven speech processing, one of the primary hurdles is accessing reliable pathological speech data. While public datasets appear to offer solutions, they come with inherent risks of potential unintended exposure of patient health information via re-identification attacks. Using a comprehensive real-world pathological speech corpus, with over n = 3800 test subjects spanning various age groups and speech disorders, we employed a deep-learning-driven automatic speaker verification (ASV) approach. This resulted in a notable mean equal error rate (EER) of 0.89 ± 0.06%, outstripping traditional benchmarks. Our comprehensive assessments demonstrate that pathological speech overall faces heightened privacy breach risks compared to healthy speech. Specifically, adults with dysphonia are at heightened re-identification risks, whereas conditions like dysarthria yield results comparable to those of healthy speakers. Crucially, speech intelligibility does not influence the ASV system’s performance metrics. In pediatric cases, particularly those with cleft lip and palate, the recording environment plays a decisive role in re-identification. Merging data across pathological types led to a marked EER decrease, suggesting the potential benefits of pathological diversity in ASV, accompanied by a logarithmic boost in ASV effectiveness. In essence, this research sheds light on the dynamics between pathological speech and speaker verification, emphasizing its crucial role in safeguarding patient confidentiality in our increasingly digitized healthcare era.
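The reported EER is the operating point at which the verification system's false-acceptance rate (impostors wrongly accepted) equals its false-rejection rate (genuine trials wrongly rejected). A minimal sketch of how it can be estimated from similarity scores (the scores below are synthetic, not the study's data):

```python
import numpy as np

def equal_error_rate(genuine, impostor):
    """Estimate the EER by sweeping a decision threshold over all observed
    scores and locating the point where FAR and FRR are closest."""
    thresholds = np.sort(np.concatenate([genuine, impostor]))
    fars = np.array([np.mean(impostor >= t) for t in thresholds])  # false accepts
    frrs = np.array([np.mean(genuine < t) for t in thresholds])    # false rejects
    i = np.argmin(np.abs(fars - frrs))
    return (fars[i] + frrs[i]) / 2.0

# Synthetic similarity scores: same-speaker trials vs. different-speaker trials.
genuine  = np.array([0.9, 0.8, 0.7, 0.4])
impostor = np.array([0.6, 0.3, 0.2, 0.1])
eer = equal_error_rate(genuine, impostor)
print(eer)  # 0.25: one genuine trial and one impostor trial fall on the wrong side
```

A lower EER means speakers are more easily distinguished, which in this privacy context translates into a higher re-identification risk for the patients in the corpus.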
A multimodal comparison of latent denoising diffusion probabilistic models and generative adversarial networks for medical image synthesis
Although generative adversarial networks (GANs) can produce large datasets, their limited diversity and fidelity have been recently addressed by denoising diffusion probabilistic models, which have demonstrated superiority in natural image synthesis. In this study, we introduce Medfusion, a conditional latent DDPM designed for medical image generation, and evaluate its performance against GANs, which currently represent the state-of-the-art. Medfusion was trained and compared with StyleGAN-3 using fundoscopy images from the AIROGS dataset, radiographs from the CheXpert dataset, and histopathology images from the CRCDX dataset. Based on previous studies, Progressively Growing GAN (ProGAN) and Conditional GAN (cGAN) were used as additional baselines on the CheXpert and CRCDX datasets, respectively. Medfusion exceeded GANs in terms of diversity (recall), achieving better scores of 0.40 compared to 0.19 in the AIROGS dataset, 0.41 compared to 0.02 (cGAN) and 0.24 (StyleGAN-3) in the CRCDX dataset, and 0.32 compared to 0.17 (ProGAN) and 0.08 (StyleGAN-3) in the CheXpert dataset. Furthermore, Medfusion exhibited equal or higher fidelity (precision) across all three datasets. Our study shows that Medfusion constitutes a promising alternative to GAN-based models for generating high-quality medical images, leading to improved diversity and fewer artifacts in the generated images.
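The precision (fidelity) and recall (diversity) scores cited here follow the manifold-based precision/recall idea for generative models: a sample counts as covered if it falls within a k-nearest-neighbour ball around some sample of the other set. A simplified sketch on synthetic 2-D data (the study's actual feature space and metric implementation may differ):

```python
import numpy as np

def knn_radius(points, k=3):
    """Distance from each point to its k-th nearest neighbour (column 0 of the
    sorted distance matrix is the point's zero distance to itself)."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    return np.sort(d, axis=1)[:, k]

def coverage(candidates, reference, k=3):
    """Fraction of candidates lying inside the reference 'manifold',
    approximated as the union of k-NN balls around reference samples."""
    radii = knn_radius(reference, k)
    d = np.linalg.norm(candidates[:, None, :] - reference[None, :, :], axis=-1)
    return float(np.mean((d <= radii[None, :]).any(axis=1)))

rng = np.random.default_rng(0)
real = rng.normal(size=(200, 2))  # stand-in for real image features
fake = rng.normal(size=(200, 2))  # stand-in for generated image features

precision = coverage(fake, real)  # fidelity: generated samples near real ones
recall    = coverage(real, fake)  # diversity: real modes reached by the generator
print(precision, recall)
```

A generator that produces sharp but repetitive images scores high precision and low recall, which is the failure mode of GANs that the diffusion model is reported to improve upon.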